

Search for: All records

Creators/Authors contains: "Farias, Vivek F"

Note: Clicking a Digital Object Identifier (DOI) link will take you to an external site maintained by the publisher. Some full-text articles may not be available free of charge during the publisher's embargo period.

Some links on this page may take you to non-federal websites. Their policies may differ from those of this site.

  1. Exploration is often necessary in online learning to maximize long-term rewards, but it comes at the cost of short-term “regret.” We study how this cost of exploration is shared across multiple groups. For example, in a clinical trial setting, patients who are assigned a suboptimal treatment effectively incur the cost of exploration. When patients are associated with natural groups on the basis of, say, race or age, it is natural to ask whether the cost of exploration borne by any single group is “fair.” So motivated, we introduce the “grouped” bandit model. We leverage the theory of axiomatic bargaining, and the Nash bargaining solution in particular, to formalize what might constitute a fair division of the cost of exploration across groups. On one hand, we show that any regret-optimal policy strikingly results in the least fair outcome: such policies will perversely leverage the most “disadvantaged” groups when they can. More constructively, we derive policies that are optimally fair and simultaneously enjoy a small “price of fairness.” We illustrate the relative merits of our algorithmic framework with a case study on contextual bandits for warfarin dosing where we are concerned with the cost of exploration across multiple races and age groups. This paper was accepted by David Simchi-Levi, data science. Funding: This work was supported by the National Science Foundation, Division of Civil, Mechanical and Manufacturing Innovation [Grant 1727239]. Supplemental Material: The online appendix and data files are available at https://doi.org/10.1287/mnsc.2022.01985.
    Free, publicly-accessible full text available December 1, 2025
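The Nash bargaining solution invoked in the abstract above maximizes the product of the groups' utility gains over their disagreement point. As a minimal sketch on hypothetical numbers (this is not the paper's algorithm, and the burden `B`, the `gain` function, and the per-group `rate` values are all illustrative assumptions), a one-dimensional grid search suffices to find the fair split of a fixed exploration burden between two groups:

```python
import math

# Hypothetical setting: a fixed exploration burden B must be split between
# two groups; a group's utility gain is what it saves relative to the
# disagreement point of bearing the full burden alone.
B = 10.0

def gain(share, rate):
    # Hypothetical concave utility gain for a group asked to bear `share`
    # of the burden; `rate` captures how costly exploration is for it.
    return math.log1p(rate * (B - share))

rate1, rate2 = 1.0, 3.0  # hypothetical per-group exploration costs

# The Nash bargaining solution maximizes the product of utility gains
# over feasible splits; a grid search handles this one-dimensional toy.
grid = [i * B / 1000 for i in range(1001)]
best = max(((s, gain(s, rate1) * gain(B - s, rate2)) for s in grid),
           key=lambda t: t[1])
print(f"NBS split: group 1 bears {best[0]:.2f}, group 2 bears {B - best[0]:.2f}")
```

Because the second group's exploration is costlier here, the product-maximizing split is asymmetric rather than an even division.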
  2. Thompson sampling has become a ubiquitous approach to online decision problems with bandit feedback. The key algorithmic task for Thompson sampling is drawing a sample from the posterior of the optimal action. We propose an alternative arm selection rule, dubbed TS-UCB, that requires negligible additional computational effort but provides significant performance improvements relative to Thompson sampling. At each step, TS-UCB computes a score for each arm using two ingredients: posterior sample(s) and upper confidence bounds. TS-UCB can be used in any setting where these two quantities are available, and it is flexible in the number of posterior samples it takes as input. TS-UCB achieves materially lower regret on a comprehensive suite of synthetic and real-world datasets, including a personalized article recommendation dataset from Yahoo! and a suite of benchmark datasets from a deep bandit suite proposed in Riquelme et al. (2018). Finally, from a theoretical perspective, we establish optimal regret guarantees for TS-UCB for both the K-armed and linear bandit models.
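The abstract above says TS-UCB scores each arm from two ingredients, posterior samples and upper confidence bounds, but does not give the exact scoring rule. As background, here is the baseline it builds on: plain Thompson sampling for a Bernoulli bandit with Beta(1, 1) priors, where the arm means and horizon are hypothetical:

```python
import random

random.seed(0)
true_means = [0.3, 0.5, 0.7]  # hypothetical arm reward probabilities
alpha = [1, 1, 1]             # Beta posterior: 1 + successes per arm
beta = [1, 1, 1]              # Beta posterior: 1 + failures per arm

for _ in range(2000):
    # Draw one posterior sample per arm and pull the argmax (the
    # "posterior sample" ingredient; TS-UCB would also fold in a UCB).
    samples = [random.betavariate(alpha[i], beta[i]) for i in range(3)]
    arm = max(range(3), key=lambda i: samples[i])
    reward = 1 if random.random() < true_means[arm] else 0
    alpha[arm] += reward
    beta[arm] += 1 - reward

pulls = [alpha[i] + beta[i] - 2 for i in range(3)]
print("pulls per arm:", pulls)  # the best arm should dominate
```

As the posteriors concentrate, the sampler pulls the highest-mean arm almost exclusively; TS-UCB replaces the argmax-of-samples step with a score that also uses each arm's upper confidence bound.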
  3. We propose a tightening sequence of optimistic approximations to the Gittins index in “Optimistic Gittins Indices.” We show that the use of these approximations in concert with the use of an increasing discount factor appears to offer a compelling alternative to state-of-the-art index schemes proposed for the Bayesian multiarmed bandit problem. We prove that the use of these optimistic indices constitutes a regret optimal algorithm. Perhaps more interestingly, the use of even the loosest of these approximations appears to offer substantial performance improvements over state-of-the-art alternatives while incurring little to no additional computational overhead relative to the simplest of these alternatives. 
  4. We consider the problem of A-B testing when the impact of the treatment is marred by a large number of covariates. Randomization can be highly inefficient in such settings, and thus we consider the problem of optimally allocating test subjects to either treatment with a view to maximizing the precision of our estimate of the treatment effect. Our main contribution is a tractable algorithm for this problem in the online setting, where subjects arrive, and must be assigned, sequentially, with covariates drawn from an elliptical distribution with finite second moment. We further characterize the gain in precision afforded by optimized allocations relative to randomized allocations, and show that this gain grows large as the number of covariates grows. Our dynamic optimization framework admits several generalizations that incorporate important operational constraints such as the consideration of selection bias, budgets on allocations, and endogenous stopping times. In a set of numerical experiments, we demonstrate that our method simultaneously offers better statistical efficiency and less selection bias than state-of-the-art competing biased coin designs. 
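The A-B testing abstract above concerns assigning arriving subjects so that covariate imbalance between treatment and control stays small. The following sketch illustrates only the problem setting with a generic deterministic minimization-style rule (pick the arm that shrinks the running imbalance vector); it is not the paper's optimized allocation policy, and the covariate distribution and sample size are illustrative assumptions:

```python
import random

random.seed(1)
dim = 5
imbalance = [0.0] * dim  # running sum of (assignment sign * covariates)

def assign(x):
    # Choose the sign (+1 treatment, -1 control) that leaves the
    # imbalance vector with the smaller squared norm.
    def norm_sq_after(sign):
        return sum((b + sign * xi) ** 2 for b, xi in zip(imbalance, x))
    sign = 1 if norm_sq_after(1) <= norm_sq_after(-1) else -1
    for i in range(dim):
        imbalance[i] += sign * x[i]
    return sign

n_treat = 0
for _ in range(500):
    x = [random.gauss(0, 1) for _ in range(dim)]  # subject covariates
    n_treat += assign(x) == 1

norm = sum(b * b for b in imbalance) ** 0.5
print("treated:", n_treat, "imbalance norm:", round(norm, 3))
```

Even this greedy rule keeps the covariate imbalance far smaller than pure randomization typically would, which is the inefficiency of randomization the abstract points to; the paper's contribution is an allocation that optimizes the precision of the treatment-effect estimate under such sequential arrivals.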